Efficient semi-supervised feature selection with noise insensitive trace ratio criterion

Authors

  • Yun Liu
  • Feiping Nie
  • Wu Jigang
  • Lihui Chen
Abstract

Feature selection is an effective way to deal with high-dimensional data. In many applications, such as multimedia and web mining, the data are high-dimensional and very large in scale, yet labeled data are often very limited. In such applications, it is important that the feature selection algorithm be efficient and exploit labeled and unlabeled data simultaneously. In this paper, we target this problem and propose an efficient semi-supervised feature selection algorithm that selects relevant features using both labeled and unlabeled data. First, we analyze a popular trace ratio criterion used in dimensionality reduction and point out that it tends to select features with very small variance. To solve this problem, we propose a noise insensitive trace ratio criterion for feature selection with a re-scale preprocessing step. Interestingly, feature selection with the noise insensitive trace ratio criterion can be solved much more efficiently. Based on this criterion, we propose a new semi-supervised feature selection algorithm. The algorithm fully explores the distribution of the labeled and unlabeled data with a special label propagation method. Experimental results verify the effectiveness of the proposed algorithm and show improvement over traditional supervised feature selection algorithms. © 2012 Elsevier B.V. All rights reserved.
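
The small-variance tendency mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's algorithm: it assumes a Fisher-style trace ratio (between-class over within-class scatter, summed over the selected features) and assumes that the re-scale step means scaling each feature to unit variance; the abstract specifies neither. The helper names `scatter_diagonals` and `best_subset` and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np


def scatter_diagonals(X, y):
    """Per-feature between-class (b) and within-class (w) scatter.

    These are the diagonals of S_b and S_w. For a 0/1 selection matrix W,
    tr(W^T S W) is the sum of the selected diagonal entries.
    """
    mean = X.mean(axis=0)
    b = np.zeros(X.shape[1])
    w = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        b += len(Xc) * (mc - mean) ** 2
        w += ((Xc - mc) ** 2).sum(axis=0)
    return b, w


def best_subset(b, w, k):
    """Exhaustively find the size-k subset maximising sum(b) / sum(w)."""
    return max(
        combinations(range(len(b)), k),
        key=lambda idx: b[list(idx)].sum() / w[list(idx)].sum(),
    )


# Synthetic two-class data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n)
sign = np.where(y == 0, -1.0, 1.0)
X = np.column_stack([
    2.0 * sign + rng.normal(size=2 * n),  # feature 0: strongly informative
    1.0 * sign + rng.normal(size=2 * n),  # feature 1: moderately informative
    1e-3 * rng.normal(size=2 * n),        # feature 2: pure noise, tiny variance
])

# Raw features: the tiny-variance noise feature barely changes either trace.
b_raw, w_raw = scatter_diagonals(X, y)
best_raw = best_subset(b_raw, w_raw, k=2)

# Assumed re-scale step: give every feature unit variance.
X_scaled = X / X.std(axis=0)
b_s, w_s = scatter_diagonals(X_scaled, y)
best_scaled = best_subset(b_s, w_s, k=2)

# After re-scaling, b + w is the same for every feature, so the ratio is
# monotone in sum(b) and a plain top-k sort finds the same subset.
top_k = tuple(sorted(np.argsort(-b_s)[:2].tolist()))

print(best_raw, best_scaled, top_k)
```

On the raw features, adding the near-constant noise feature contributes almost nothing to either trace, so the ratio favours it over the second informative feature. Under the unit-variance assumption, every feature has the same total scatter, which is one way to see why selection can reduce to a simple sort and hence be solved much faster than a subset search.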

Similar resources

Trace ratio criterion based generalized discriminative learning for semi-supervised dimensionality reduction

Dealing with high-dimensional data has always been a major problem in many pattern recognition and machine learning applications. The trace ratio criterion is applicable to many dimensionality reduction methods, as it directly reflects the Euclidean distance between data points within or between classes. In this paper, we analyze the trace ratio problem and propose a new effic...

Semi-Supervised Fuzzy-Rough Feature Selection

With the continued and relentless growth in dataset sizes in recent times, feature or attribute selection has become a necessary step in tackling the resultant intractability. Indeed, as the number of dimensions increases, the number of corresponding data instances required in order to generate accurate models increases exponentially. Fuzzy-rough set-based feature selection techniques offer gre...

Infomation based supervised and semi-supervised feature selection

We merge the results from both supervised and semi-supervised feature selection techniques. The method was applied to the five datasets from the NIPS feature selection competition. As a preprocessing step, we first discretize each training dataset using the EM algorithm. Then, we filter the discretized dataset based on the MI (mutual information) value of each feature with respect to the class var...

Trace Ratio Criterion for Feature Selection

Fisher score and Laplacian score are two popular feature selection algorithms, both of which belong to the general graph-based feature selection framework. In this framework, a feature subset is selected based on the corresponding score (subset-level score), which is calculated in a trace ratio form. Since the number of all possible feature subsets is huge, it is often prohibitively expens...

Hypergraph Spectra for Semi-supervised Feature Selection

In many data analysis tasks, one is often confronted with the problem of selecting features from very high dimensional data. Most existing feature selection methods focus on ranking individual features based on a utility criterion, and select the optimal feature set in a greedy manner. However, the feature combinations found in this way do not give optimal classification performance, since they...

Journal:
  • Neurocomputing

Volume 105

Publication date: 2013